Lechang LIU Takayasu SAKURAI Makoto TAKAMIYA
A 315 MHz power-gated ultra low power transceiver for wireless sensor network is developed in 40 nm CMOS. The developed transceiver features an injection-locked frequency multiplier for carrier generation and a power-gated low noise amplifier with current second-reuse technique for receiver front-end. The injection-locked frequency multiplier implements frequency multiplication by edge-combining and thereby achieves 11 µW power consumption at 315 MHz. The proposed low noise amplifier achieves the lowest power consumption of 8.4 µW with 7.9 dB noise figure and 20.5 dB gain in state-of-the-art designs.
Shuta TOGASHI Takashi OHSAWA Tetsuo ENDOH
In this paper, we propose a new low power nonvolatile counter unit based on Magnetic Tunnel Junction (MTJ) with fine-grained power gating. The proposed counter unit consists of only a single latch with two MTJs. We verify the basic operation and estimate the power consumption of the proposed counter unit. The operating power consumption of the proposed nonvolatile counter unit is smaller than the conventional one below 140 kHz. The power of the proposed unit is 74.6% smaller than the conventional one at low frequency.
Kyung-Chang RYOO Jeong-Hoon OH Sunghun JUNG Hyungjin KIM Byung-Gook PARK
Effects of conductive defects on unipolar resistive random access memory (RRAM) are investigated in order to reduce the operation current for high density and low power RRAM applications. It is clarified that forming voltage decreases with increasing charged conductive defects which are a source of conductive filament (CF) path and with decreasing cell thickness. Random circuit breaker (RCB) network simulation model which is a dynamic percolation simulation model is used to elucidate these effects. From this simulation results, the optimal cell thickness with sufficient conductive defect shows improved resistive switching characteristics such as low forming voltage, small set voltage distribution and low reset current. From the deep understanding of relationship between conductive defect in various cell thickness and other resistive switching parameters, RRAM with low forming voltage and reset current can be obtained and it will be one of the most promising next generation nonvolatile memories.
We have investigated the low-power circuit applicability of hetero-gate-dielectric tunneling field-effect transistors (HG TFETs). Based on the device-level comparison of HG, SiO2-only and high-k-only TFETs, their circuit performance and energy consumption have been discussed. It has been shown that HG TFETs can deliver 14400x higher performance than the SiO2-only TFETs and 17x higher performance than the high-k-only TFETs due to its higher on current and lower capacitance at the same static power, same power supply. It has been revealed that HG TFETs have better voltage scalability than the others. It is because HG TFETs dissipate only 8% of energy consumption of SiO2-only TFETs and 17% of that of high-k-only TFETs under the same performance condition.
The Viterbi algorithm is widely used for decoding of the convolutional codes. The trace-back method is preferable to the register exchange method because of lower power consumption especially for convolutional codes with many states. A drawback of the conventional trace-back is that it generally requires long latency to obtain the decoded data. In this paper, a method of the trace-back with source states instead of decision bits is proposed which reduces the number of memory accesses. The dedicated memory is also presented which supports the proposed trace-back method. The reduced memory accesses result in smaller power consumption and a shorer decode latency than the conventional method.
Hirofumi IWATO Keishi SAKANUSHI Yoshinori TAKEUCHI Masaharu IMAI
To measure the detrusor pressure for diagnosing lower urinary tract symptoms, we designed a small-area and low-power System on a Chip (SoC). The SoC should be small and low power because it is encapsulated in tiny air-tight capsules which are simultaneously inserted in the urinary bladder and rectum for several days. Since the SoC is also required to be programmable, we designed an Application Specific Instruction set Processor (ASIP) for pressure measurement and wireless communication, and implemented almost required functions on the ASIP. The SoC was fabricated using a 0.18 µm CMOS mixed-signal process and the chip size is 2.5 2.5 mm2. Evaluation results show that the power consumption of the SoC is 93.5 µW, and that it can operate the capsule for seven days with a tiny battery.
Mohammad Reza RESHADINEZHAD Mohammad Hossein MOAIYERI Kaivan NAVI
The reduction in the gate length of the current devices to 65 nm causes their I-V characteristics to depart from the traditional MOSFETs. As a result, manufacturing of new efficient devices in nanoscale is inevitable. The fundamental properties of the metallic and semi-conducting carbon Nanotubes (CNTs) make them alternatives to the conventional silicon-based devices. In this paper an ultra high-speed and energy-efficient full adder is proposed, using Carbon Nanotube Field Effect Transistor (CNFET) in nanoscale. Extensive simulation results using HSPICE are reported to show that the proposed adder consumes lower power, and is faster compared to the previous adders.
Ce LI Yiping DONG Takahiro WATANABE
An FPGA plays an essential role in industrial products due to its fast, stable and flexible features. But the power consumption of FPGAs used in portable devices is one of critical issues. Top-down hierarchical design method is commonly used in both ASIC and FPGA design. But, in the case where plural modules are integrated in an FPGA and some of them might be in sleep-mode, current FPGA architecture cannot be fully effective. In this paper, coarse-grained power gating FPGA architecture is proposed where a whole area of an FPGA is partitioned into several regions and power supply is controlled for each region, so that modules in sleep mode can be effectively power-off. We also propose a region oriented FPGA placement algorithm fitted to this user's hierarchical design based on VPR [1]. Simulation results show that this proposed method could reduce power consumption of FPGA by 38% on average by setting unused modules or regions in sleep mode.
Hong-Yi HUANG Shiun-Dian JAN Yang CHOU Cheng-Yu CHEN
The charge-redistribution low-swing differential logic (CLDL) circuits are presented in this work. It can implement a complex function in a single gate. The CLDL circuits utilizes the charge-redistribution and reduced-swing schemes to reduce the power dissipation and enhance the operation speed. In addition, a pipeline structure is formed by a series connection structure controlled by a true-single-phase clock, thereby achieving high-speed operation. The CLDL circuits perform more than 25% speedup and 31% in power-delay product compared to other differential circuits with true-single-phase clock. A pipelined multiplier-accumulator (MAC) using CLDL structure is fabricated in 0.35 µm single-poly four-metal CMOS process. The test chip is successfully verified to operate at 900-MHz.
Ce LI Yiping DONG Takahiro WATANABE
Since the power consumption of FPGA is larger than that of ASIC under the condition to perform the same function using the same scaling, the application of FPGA is limited especially in portable electronic devices. In this paper, we propose a novel low-power FPGA architecture based on coarse-grained power gating to reduce power consumption. The new placement algorithm and routing resource graph for sleep regions is also presented. After enhancing the CAD framework, a detailed discussion is given under different region size supported by the new FPGA architecture. As a result, our proposed FPGA architecture combined with the new placement and routing algorithm can reduce 19.4% in the total power consumption compared with the traditional FPGA. By using our proposed method, FPGA is promising to be widely applied to portable devices.
This paper proposes a high-efficiency on-chip DC-DC down-conversion technique using selectable supply-voltage charge-recycling. This technique converts an external high supply-voltage (2VDD) to an on-chip low supply-voltage (VDD) by using charge-recycling. It partitions the original logic using VDD into high logic (H-logic) and low logic (L-logic), consuming nearly the same amount of power. The H-logic uses a higher supply-voltage (2VDD and VDD). The L-logic uses a lower supply-voltage (VDD and ground). The charge used in the H-logic is recycled in the L-logic. In order to reduce a charge mismatch between the H-logic and the L-logic, this scheme dynamically changes the ratio between the H-logic and the L-logic by selecting the supply-voltages used by the divided logic blocks. To verify the DC-DC down-conversion using the proposed charge-recycling scheme, a test chip was fabricated using a 0.35 µm CMOS technology. Its power efficiency was measured at 93%.
Toshihiro KONISHI Shintaro IZUMI Koh TSURUDA Hyeokjong LEE Takashi TAKEUCHI Masahiko YOSHIMOTO Hiroshi KAWAGUCHI
Concomitantly with the progress of wireless communications, cognitive radio has attracted attention as a solution for depleted frequency bands. Cognitive radio is suitable for wireless sensor networks because it reduces collisions and thereby achieves energy-efficient communication. To make cognitive radio practical, we propose a low-power multi-resolution spectrum sensing (MRSS) architecture that has flexibility in sensing frequency bands. The conventional MRSS scheme consumes much power and can be adapted only slightly to process scaling because it comprises analog circuits. In contrast, the proposed architecture carries out signal processing in a digital domain and can detect occupied frequency bands at multiple resolutions and with low power. Our digital MRSS module can be implemented in 180-nm and 65-nm CMOS processes using Verilog-HDL. We confirmed that the processes respectively dissipate 9.97 mW and 3.45 mW.
Byeong-Woo KOO Seung-Jae PARK Gil-Cho AHN Seung-Hoon LEE
This work describes a 12-bit 100 MS/s 0.13 µm CMOS three-stage pipeline ADC with various circuit design techniques to reduce power and die area. Digitally controlled timing delay and gate-bootstrapping circuits improve the linearity and sampling time mismatch of the SHA-free input network composed of an MDAC and a FLASH ADC. A single two-stage switched op-amp is shared between adjacent MDACs without MOS series switches and memory effects by employing two separate NMOS input pairs based on slightly overlapped switching clocks. The interpolation, open-loop offset sampling, and two-step reference selection schemes for a back-end 6-bit flash ADC reduce both power consumption and chip area drastically compared to the conventional 6-bit flash ADCs. The prototype ADC in a 0.13 µm CMOS process demonstrates measured differential and integral non-linearities within 0.44LSB and 1.54LSB, respectively. The ADC shows a maximum SNDR and SFDR of 60.5 dB and 71.2 dB at 100 MS/s, respectively. The ADC with an active die area of 0.92 mm2 consumes 19 mW at 100 MS/s from a 1.0 V supply. The measured FOM is 0.22 pJ/conversion-step.
Jiongyao YE Yu WAN Takahiro WATANABE
Modern microprocessors employ caches to bridge the great speed variance between a main memory and a central processing unit, but these caches consume a larger and larger proportion of the total power consumption. In fact, many values in a processor rarely need the full-bit dynamic range supported by a cache. The narrow-width value occupies a large portion of the cache access and storage. In view of these observations, this paper proposes an Adaptive Various-width Data Cache (AVDC) to reduce the power consumption in a cache, which exploits the popularity of narrow-width value stored in the cache. In AVDC, the data storage unit consists of three sub-arrays to store data of different widths. When high sub-arrays are not used, they are closed to save its dynamic and static power consumption through the modified high-bit SRAM cell. The main advantages of AVDC are: 1) Both the dynamic and static power consumption can be reduced. 2) Low power consumption is achieved by the modification of the data storage unit with less hardware modification. 3) We exploit the redundancy of narrow-width values instead of compressed values, thus cache access latency does not increase. Experimental results using SPEC 2000 benchmarks show that our proposed AVDC can reduce the power consumption, by 34.83% for dynamic power saving and by 42.87% for static power saving on average, compared with a cache without AVDC.
Sung-Sun CHOI Han-Yeol YU Yong-Hoon KIM
This paper presents a current-reused quadrature voltage-controlled oscillator (QVCO) which adopts a source-connection coupling structure. The QVCO simultaneously achieves low phase noise and low power consumption by newly combining current-reused VCOs and coupling transistors. The measured QVCO obtains good FoM of -188.2 dBc at a frequency of 2.2 GHz with 3.96 mW power consumption.
Jiongyao YE Yingtao HU Hongfeng DING Takahiro WATANABE
Power consumption has become an increasing concern in high performance microprocessor design. Especially, Instruction Cache (I-Cache) contributes a large portion of the total power consumption in a microprocessor, since it is a complex unit and is accessed very frequently. Several studies on low-power design have been presented for the power-efficient cache design. However, these techniques usually suffer from the restrictions in the traditional Instruction Fetch Unit (IFU) architectures where the fetch address needs to be sent to I-Cache once it is available. Therefore, work to reduce the power consumption is limited after the address generation and before starting an access. In this paper, we present a new power-aware IFU architecture, named Analysis Before Starting an Access (ABSA), which aims at maximizing the power efficiency of the low-power designs by eliminating the restrictions on those low-power designs of the traditional IFU. To achieve this goal, ABSA reorganizes the IFU pipeline and carefully assigns tasks for each stages so that sufficient time and information can be provided for the low-power techniques to maximize the power efficiency before starting an access. The proposed design is fully scalable and its cost is low. Compared to a conventional IFU design, simulation results show that ABSA saves about 30.3% fetch power consumption, on average. I-Cache employed by ABSA reduces both static and dynamic power consumptions about 85.63% and 66.92%, respectively. Meanwhile the performance degradation is only about 0.97%.
Kenji SUZUKI Mamoru UGAJIN Mitsuru HARADA
A micro-power active-RFID LSI with an all-digital RF-transmitting scheme achieves experimental 10-m-distance communication with a 1-Mbps data rate in the 300-MHz frequency band. The IC consists of an RF transmitter and a power supply circuit. The RF transmitter generates wireless signals without a crystal. The power supply circuit controls the energy flow from the battery to the IC and offers intermittent operation of the RF transmitter. The IC draws 1.6 µA from a 3.4-V supply and is implemented in a 0.2-µm CMOS process in an area of 1 mm2. The estimated lifetime of the IC is over ten years with a coin-size battery.
Toru SHIMIZU Kazutami ARIMOTO Osamu NISHII Sugako OTANI Hiroyuki KONDO
Various low power technologies have been developed and applied to LSIs from the point of device and circuit design. A lot more CPU cores as well as function IPs are integrated on a single chip LSI today. Therefore, not only the device and circuit low power technologies, but software power control technologies are becoming more important to reduce active power of application systems. This paper overviews the low power technologies and defines power management platform as a combination of hardware functions and software programming interface. This paper discusses importance of the power management platform and direction of its development.
Tianruo ZHANG Chen LIU Minghui WANG Satoshi GOTO
This paper proposes a region-of-interest (ROI) based H.264 encoder and the VLSI architecture of the ROI detection algorithm. In ROI based video coding system, pre-processing unit to detect ROI should only introduce low computational complexity overhead due to the low power requirement. The Macroblocks (MBs) in ROIs are detected sequentially in the same order of H.264 encoding to satisfy the MB level pipelining of ROI detector and H.264 encoder. ROI detection is performed in a novel estimation-and-verification process with an ROI contour template. Proposed architecture can be configured to detect either single ROI or multiple ROIs in each frame and the throughput of single detection mode is 5.5 times of multiple detection mode. 98.01% and 97.89% of MBs in ROIs can be detected in single and multiple detection modes respectively. Hardware cost of proposed architecture is only 4.68 k gates. Detection speed is 753 fps for CIF format video at the operation frequency of 200 MHz in multiple detection mode with power consumption of 0.47 mW. Compared with previous fast ROI detection algorithms for video coding application, the proposed architecture obtains more accurate and smaller ROI. Therefore, more efficient ROI based computation complexity and compression efficiency optimization can be implemented in H.264 encoder.
Dejun QIAN Zhe ZHANG Chen HU Xincun JI
Power-aware scheduling of periodic tasks in real-time systems has been extensively studied to save energy while still meeting the performance requirement. Many previous studies use the probability information of tasks' execution cycles to assist the scheduling. However, most of these approaches adopt heuristic algorithms to cope with realistic CPU models with discrete frequencies and cannot achieve the globally optimal solution. Sometimes they even show worse results than non-stochastic DVS schemes. This paper presents an optimal DVS scheme for frame-based real-time systems under realistic power models in which the processor provides only a limited number of speeds and no assumption is made on power/frequency relation. A suboptimal DVS scheme is also presented in this paper to work out a solution near enough to the optimal one with only polynomial time expense. Experiment results show that the proposed algorithm can save at most 40% more energy compared with previous ones.